ABSTRACT

InnateDB (http://www.innatedb.com) is an integra- 
ted analysis platform that has been specifically 
designed to facilitate systems-level analyses of 
mammalian innate immunity networks, pathways 
and genes. In this article, we provide details of 
recent updates and improvements to the database. 
InnateDB now contains >196 000 human, mouse
and bovine experimentally validated molecular 
interactions and 3000 pathway annotations of 
relevance to all mammalian cellular systems (i.e. 
not just immune relevant pathways and inter- 
actions). In addition, the InnateDB team has, to 
date, manually curated in excess of 18 000 molecular
interactions of relevance to innate immunity, 
providing unprecedented insight into innate 
immunity networks, pathways and their component 
molecules. More recently, InnateDB has also 
initiated the curation of allergy- and asthma-related 
interactions. Furthermore, we report a range of im- 
provements to our integrated bioinformatics solu- 
tions including web service access to InnateDB 
interaction data using Proteomics Standards 
Initiative Common Query Interface, enhanced Gene 
Ontology analysis for innate immunity, and the availability
of new network visualization tools. Finally,
the recent integration of bovine data makes 
InnateDB the first integrated network analysis 
platform for this agriculturally important model 
organism. 



INTRODUCTION 

The innate immune response is a critical branch of 
immunity, which not only provides a first line of defence 
against pathogens, but also regulates and shapes subse- 
quent adaptive responses. Innate immunity, however, 
can also do great harm by driving inappropriate inflammatory
cascades. Therefore, complex molecular networks
are required to regulate innate immunity and maintain 
appropriate and specific responses to different pathogens, 
while limiting potential harm from dysregulated 
inflammation (1-5). The intricate interplay of a multitude 
of regulatory layers that initiate and coordinate the innate 
immune response has led to an ever-increasing interest in 
applying systems-oriented approaches to better under- 
stand innate immunity and its modulators (6). 

InnateDB (publicly accessible at http://www.innatedb.com
and mirrored at http://innatedb.teagasc.ie) is a knowledge
base and analysis platform that was specifically
designed to provide a systems-oriented yet user-friendly
tool for integrative analyses of the mammalian innate
immune response (7).

INNATEDB CURATION 

A key component of the InnateDB project is the context- 
ual manual curation of innate immunity interactions, 
pathways and their component molecules. The curation 
process has previously been described in detail (8). 
InnateDB was first publicly released in 2008 (7). At that 
time, ~3500 molecular interactions had been curated. By 
2010, the database contained 11 786 InnateDB-curated 
molecular interactions from the review of >3000 pub- 
lished articles. As of September 2012, our curation team 
has reviewed >4300 publications, and >18 000 interactions
of relevance to innate immunity have been
annotated (Figure 1; for detailed statistics see
http://www.innatedb.com/statistics). More recently, as part of
the AllerGen Networks of Centres of Excellence (NCE) 
(http://www.allergen-nce.com), InnateDB curators have 
also begun to annotate interactions and pathways of rele- 
vance to allergy and asthma. All interactions in InnateDB 
are provided with detailed contextual information accord- 
ing to the minimum information required for reporting a 
molecular interaction experiment (MIMIx) standards (9), 
including the evidence supporting each interaction, the 
tissue or cell type the interaction was reported in, the 
type of interaction and the method of detection. New 
interactions are added to the database weekly, providing 
up-to-date annotation on the innate immunity 
interactome. This resource can be mined to identify new 
relationships between innate immunity and other 
processes, to identify potential novel regulators of innate 
immunity and to interpret a user's own data (e.g. gene 
expression data) from a network biology perspective. 

Building a comprehensive list of innate immunity genes 

Aside from annotating molecular interactions, InnateDB 
now also annotates which genes have a published role in 
innate immunity, providing a brief summary of that role
and links to the relevant publications. This data set, available
at http://www.innatedb.com/curatedGenes, presently
contains >1500 genes (957 human, 527 mouse and 46
bovine) and is the most comprehensive list of genes 
involved in innate immunity that is available. This list
was recently used by a group of researchers to show that
human proteins that are targeted by viruses are highly
enriched for roles in innate immunity (10).

Contribution to the International Molecular Exchange 
Consortium 

In 2010, InnateDB became a member of the International 
Molecular Exchange Consortium (IMEx) (11). This orga- 
nization is dedicated to developing rules for describing 
molecular interaction data, actively curating these inter- 
actions from the scientific literature and making them 
available through a common website. 

Within IMEx, InnateDB has committed to curating 
every issue of Nature Immunology from September 2010 
onwards using IMEx curation standards (12). Because 
IMEx curation requires more annotation detail than the 
MIMIx level (9) currently supported by InnateDB's sub- 
mission system, InnateDB curators are submitting these 
IMEx interactions through the IntAct interaction 
database (13). On submission, each IMEx interaction is 
thoroughly reviewed by an IntAct curator before it is
accepted and released. In addition to submitting to IntAct,
all InnateDB acceptable interactions (i.e. interactions of
relevance to innate immunity) from Nature Immunology
are also deposited into InnateDB.

Figure 1. The InnateDB curated interactome in July 2012. Red edges represent interactions that have been added in 2011 and 2012.

Integrating data from external resources 

To supplement our manual curation efforts and to 
provide a snapshot of the entire interactome beyond 
known innate immunity interactions, InnateDB imports 
data from a wide range of genome, interaction and 
pathway databases (http://www.innatedb.com/resources). 
Currently, InnateDB contains 178 000+ imported experi- 
mentally validated interactions, 3000+ pathways and 
300 000+ interactions based on Ortholuge (14) orthology
predictions (interologs) in addition to the 18 000+ 
InnateDB manually curated interactions. 



INTEGRATION OF BOVINE DATA: ORTHOLOGY-BASED PATHWAY & NETWORK RECONSTRUCTION

In February 2012, a new version of InnateDB was released 
that included the incorporation of bovine gene, pathway 
and molecular interaction annotation in addition to the 
existing data for human and mouse. This new version of 
the platform now also facilitates a systems biology 
approach to the investigation of the bovine innate 
immune response and is poised to deepen our understand- 
ing of important bovine infectious diseases associated with 
significant economic losses (e.g. bovine tuberculosis and 
mastitis), as well as enabling cross-species comparisons of 
innate immunity. 

As bovine experimentally validated interactions and 
pathways are virtually non-existent, InnateDB uses an 
orthology-based approach to predict bovine pathways 
and interactions primarily from human data. One should 
be aware that this approach results in a humanized and 
frequently incomplete representation of the bovine 
interactome, but in the absence of widespread experimen- 
tal data it provides at least a network biology framework 
to build on and to generate hypotheses that can be 
subsequently experimentally validated. In InnateDB,
experimentally validated and predicted interactions are
clearly labelled as such. As of September 2012, InnateDB
contains >70 000 bovine interologs (interactions based
on orthology) involving 10 717 bovine genes. In each
case, one can link back to the orthologous human inter- 
action to review evidence for the interaction. 
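
The orthology-based transfer of interactions described above can be sketched in a few lines of Python. This is a minimal illustration only: it assumes a simple one-to-one human-to-bovine ortholog map, and the function name `predict_interologs` and all gene identifiers are hypothetical placeholders. The actual InnateDB pipeline, which builds on Ortholuge predictions, may treat one-to-many orthologs and evidence tracking differently.

```python
from typing import Dict, Iterable, List, Tuple

Interaction = Tuple[str, str]


def predict_interologs(
    human_interactions: Iterable[Interaction],
    human_to_bovine: Dict[str, str],
) -> List[Tuple[Interaction, Interaction]]:
    """Map human interactions onto bovine genes when both partners have an ortholog.

    Returns (bovine_pair, source_human_pair) tuples so that every predicted
    interaction can be traced back to the supporting human interaction.
    """
    interologs = []
    for gene_a, gene_b in human_interactions:
        ortholog_a = human_to_bovine.get(gene_a)
        ortholog_b = human_to_bovine.get(gene_b)
        # An interolog is only predicted if both participants are conserved.
        if ortholog_a is not None and ortholog_b is not None:
            interologs.append(((ortholog_a, ortholog_b), (gene_a, gene_b)))
    return interologs


# Toy data: identifiers are illustrative placeholders, not InnateDB records.
human_interactions = [("TLR4", "MYD88"), ("MYD88", "IRAK4"), ("TLR4", "GENE_X")]
human_to_bovine = {"TLR4": "bTLR4", "MYD88": "bMYD88", "IRAK4": "bIRAK4"}

predicted = predict_interologs(human_interactions, human_to_bovine)
```

Keeping the source human pair alongside each prediction mirrors the ability to link back to the orthologous human interaction when reviewing evidence.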

The latest release of InnateDB also uses orthology pre- 
dictions to transfer human and mouse pathway annota- 
tions to bovine genes in real time. Currently, pathway 
annotations can be assigned to 7032 bovine genes by 
orthology to human genes. Notably, although only
~70% of all human genes (14 316 genes) have a predicted
bovine ortholog, a significantly higher proportion
(85%) of human genes with pathway annotations have a
bovine ortholog. This higher prevalence of conserved
genes among pathway-annotated genes indicates that 
many of the associated processes may be well preserved. 



To further examine the appropriateness of the 
orthology-based annotation transfer on a per-pathway 
basis, we determined the 'conservation rate' (cons) of 
each pathway, defined as the ratio of pathway participants 
in the source organism (human/mouse) that have a 
putative counterpart in the target organism (cow) to the 
total number of participants in the source organism. As of 
September 2012, InnateDB contains 1536 human 
pathways with five or more pathway participants; 80%
(1257 pathways) of these have a conservation rate of 0.8 
or better. The corresponding number for a conservation 
rate of >0.7 is 93% (1442 pathways). The high prevalence 
of strongly conserved pathways seems to largely justify an 
orthology-based approach for inferring bovine pathways. 
Supplementary Table S1 lists the remaining 107 pathways
with a relatively low conservation rate (cons <0.7). 
Notably, the list of pathways for which an 
orthology-based reconstruction is challenging includes 
>30 immunologically important pathways. In some 
cases, the low conservation rate can be attributed to real 
divergence of the underlying processes. The bovine Type I 
Interferon family, for example, has been shown to have 
undergone widespread expansion, including the diver- 
gence of a new Type I interferon (IFN) family (IFNX) 
in the cow from IFN alpha (15). In other cases, the con- 
servation rate might further increase with future improve- 
ments to the quality of the bovine draft genome. 
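
The conservation rate defined above is a simple ratio and can be computed as in the following sketch. The pathway names, gene identifiers and the helper `conservation_rate` are hypothetical; the five-participant minimum and the 0.7 threshold are taken from the text.

```python
from typing import Dict, Set


def conservation_rate(participants: Set[str], orthologs: Dict[str, str]) -> float:
    """Fraction of source-organism pathway participants with a putative ortholog."""
    if not participants:
        raise ValueError("pathway has no participants")
    conserved = sum(1 for gene in participants if gene in orthologs)
    return conserved / len(participants)


# Toy ortholog map (source gene -> target gene); placeholders only.
orthologs = {"G1": "b1", "G2": "b2", "G3": "b3", "G4": "b4"}

# Toy pathways in the source organism.
pathways = {
    "pathway_a": {"G1", "G2", "G3", "G4", "G5"},  # 4 of 5 conserved
    "pathway_b": {"G1", "G2", "G3", "G6", "G7"},  # 3 of 5 conserved
    "pathway_c": {"G1", "G8"},                    # fewer than five participants
}

MIN_PARTICIPANTS = 5
LOW_CONSERVATION = 0.7

rates = {
    name: conservation_rate(genes, orthologs)
    for name, genes in pathways.items()
    if len(genes) >= MIN_PARTICIPANTS
}
poorly_conserved = sorted(name for name, cons in rates.items() if cons < LOW_CONSERVATION)
```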

In addition to orthology-based annotation transfer, the 
tissue expression profile of a gene can provide some insight 
into its potential function (16). Through collaboration 
with colleagues at the United States Department of 
Agriculture, InnateDB now integrates bovine tissue expression
data for >13 000 genes. This data was sourced
from the Bovine Gene Atlas (17), which has profiled gene
expression across 87 different bovine tissues using a
next-generation sequencing approach. A graphical tissue expression
profile is available on the Gene Card page of
bovine genes. 



INNATEDB DATA ANALYSIS AND VISUALIZATION 

InnateDB can serve as a knowledge base where users can 
search for particular genes or proteins of interest and their 
associated interactions and pathways, using a variety of 
search fields. Alternatively, InnateDB can be queried in a 
more high-throughput fashion, where users can upload a 
list of genes/proteins and associated quantitative data 
from up to 10 different conditions (e.g. gene expression 
data) and carry out more complex searches and analyses 
(Figure 2). After uploading a list of gene IDs (human, 
mouse and bovine Ensembl, RefSeq, Entrez Gene, 
UniProt and InnateDB IDs are all accepted), users can 
quickly find which pathways, Gene Ontologies (including 
enhanced innate immunity gene annotation) or tran- 
scription factor binding sites are statistically over- 
represented in their dataset. Users can also use 
InnateDB to build, visualize and analyse molecular inter- 
action networks consisting of their uploaded genes and 
their encoded products. One can, for example, construct 
a network of how differentially expressed genes interact
with one another. Quantitative data uploaded by the user
is automatically overlaid on these networks. Recent improvements
to InnateDB include the option to incorporate
interactions based on orthology in the construction of molecular
interaction networks and the option to restrict the
networks to contain only InnateDB manually curated
interactions. Further enhancements to the web interface
include more intuitive page layouts, faster searches and
analyses, and a variety of other changes (see
http://www.innatedb.com/news).

Figure 2. Data analysis workflow in InnateDB.
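
To illustrate what statistical over-representation of a pathway in an uploaded gene list means, the following sketch computes a one-sided hypergeometric P-value. This section does not state which test InnateDB applies, so the choice of the hypergeometric distribution is an assumption (it is a common choice for this kind of analysis), and multiple-testing correction is deliberately omitted. The function name and the numbers are illustrative.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric.

    N: number of annotated genes in the background
    K: number of background genes in the pathway
    n: number of genes in the uploaded list
    k: number of uploaded genes that fall in the pathway
    """
    upper = min(n, K)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Toy example: 20 background genes, 5 in the pathway, 4 uploaded, 2 hits.
p_value = hypergeom_upper_tail(k=2, n=4, K=5, N=20)
```

A small P-value suggests that the pathway contains more of the uploaded genes than expected by chance; in practice, P-values across all tested pathways would need to be corrected for multiple testing.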

Network visualization tools 

All interactions in InnateDB may be downloaded in 
several standardized formats including text-based 
formats (tab, csv, xls), the simple interaction format (sif) 
and the PSI-MI XML 2.5 and MITAB formats (18). 
Additionally, interaction networks may also be visualized 
in our Cerebral program (19), a Java plugin for the 
Cytoscape network visualization software (20,21), which 
uses subcellular localization information to orientate 
interaction networks in a more biologically intuitive 
pathway-like layout. Networks can also be visualized in 
other third-party software including the CyOog plugin
(22), which uses Power Graph analysis to reduce
network complexity by explicitly representing recurring
network motifs. Recently, we have also integrated 
BioLayout Express 3D 2.2 (23), an application designed 
for the visualization, clustering and analysis of large 
networks in 2D and 3D space. 
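
Of the download formats listed above, the simple interaction format (sif) is the most compact: each line names a source node, a relationship type and a target node. The sketch below writes such lines from a list of edges. The helper `to_sif`, the gene names and the `pp` relationship label are illustrative assumptions and do not reproduce InnateDB's own export.

```python
from typing import Iterable, Tuple


def to_sif(edges: Iterable[Tuple[str, str, str]]) -> str:
    """Serialize (source, relationship, target) triples as tab-delimited SIF lines."""
    return "\n".join(f"{source}\t{relationship}\t{target}" for source, relationship, target in edges)


# Toy network; "pp" is used here as a label for a protein-protein interaction.
edges = [("TLR4", "pp", "MYD88"), ("MYD88", "pp", "IRAK4")]
sif_text = to_sif(edges)
```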

Proteomics Standards Initiative Common Query Interface 
implementation 

Interaction data in InnateDB can now also be queried
using web services implementing the Proteomics Standards
Initiative Common Query Interface (PSICQUIC)
(24). PSICQUIC is an effort from the Human Proteome
Organization Proteomics Standards Initiative
(http://www.hupo.org/research/psi/) to standardize programmatic
access to molecular interaction databases based on
the PSI standard formats (PSI-MI XML and MITAB)
(18). It defines standard web services and also a query
syntax for powerful and flexible searches.

All data sources implementing PSICQUIC can be
queried in exactly the same way, i.e. the same query can
be used to retrieve the relevant data from many different
interaction data sources. Independently published
observations of an experimental system, curated by independent
databases, are then integrated in response to a
single user query (see http://www.ebi.ac.uk/intact/imex).
PSICQUIC web services are RESTful (REpresentational
State Transfer) but can also be accessed through SOAP
(Simple Object Access Protocol). A list of available
services for InnateDB can be found at
http://imex.innatedb.com/psicquic-ws/webservices. InnateDB
updates the data files for the PSICQUIC web services
weekly and additionally provides them for download in
a compressed format at http://www.innatedb.com/downloads.
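
As a rough illustration of programmatic access, the sketch below builds a REST query URL and parses one MITAB 2.5 record. Only the base URL comes from the text. The path layout (`/current/search/query/...`), the `format`, `firstResult` and `maxResults` parameters and the example query string follow the PSICQUIC REST conventions as we understand them and should be treated as assumptions to be checked against the service listing. The short column names and the example record are our own placeholders. No network request is made.

```python
from urllib.parse import quote, urlencode

# Base URL as given in the text; the path layout used below is an assumption.
INNATEDB_PSICQUIC = "http://imex.innatedb.com/psicquic-ws/webservices"

# The 15 columns of a MITAB 2.5 record, in order (short names are ours).
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_id_a", "alt_id_b", "alias_a", "alias_b",
    "detection_method", "first_author", "publication_id",
    "taxid_a", "taxid_b", "interaction_type", "source_db",
    "interaction_id", "confidence",
]


def build_query_url(base_url, miql, fmt="tab25", first_result=0, max_results=50):
    """Build a PSICQUIC-style REST URL for a query string (no request is sent)."""
    path = f"{base_url.rstrip('/')}/current/search/query/{quote(miql, safe='')}"
    params = urlencode(
        {"format": fmt, "firstResult": first_result, "maxResults": max_results}
    )
    return f"{path}?{params}"


def parse_mitab25_line(line):
    """Split one tab-delimited MITAB 2.5 line into a dict keyed by column name."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < len(MITAB25_COLUMNS):
        raise ValueError("expected at least 15 tab-separated fields")
    return dict(zip(MITAB25_COLUMNS, fields))


url = build_query_url(INNATEDB_PSICQUIC, "species:human AND TLR4")

# Made-up record for illustration only; "-" marks an empty MITAB field.
example_line = "\t".join(
    ["exampledb:A", "exampledb:B"] + ["-"] * 7
    + ["taxid:9606", "taxid:9606"] + ["-"] * 4
)
record = parse_mitab25_line(example_line)
```

Because every PSICQUIC service is meant to accept the same query syntax, the same `build_query_url` call could in principle be pointed at another provider by changing only the base URL.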



ONGOING DEVELOPMENTS 

InnateDB will maintain its curation efforts to annotate 
interactions and genes of relevance to innate immunity, 
with weekly updated annotation, thus continuing to 
provide a comprehensive platform for systems and 
network biology analyses of innate immune-associated responses.
Incorporation of data from external
resources, encompassing the wider human, mouse and
bovine interactomes, will also continue to facilitate
analyses beyond innate immunity by a wide range of researchers.
Additionally, InnateDB intends to expand
beyond the curation of innate immunity-relevant
networks, incorporating more adaptive immunity information.
We are currently developing a first version of an
Allergy and Asthma Portal that will further integrate data 
on allergy and associated immune interactions from both 
the literature and researchers from AllerGen. This portal 
will be built on InnateDB and will provide an analysis 
platform for more sophisticated network biology-based 
investigations of allergy and asthma responses. These
interactions will be distinguishable from innate immunity
interactions, so that users can continue to focus their
analyses on the innate immunity interactome.

Future developments will include improvements
to InnateDB pathway analysis tools. The over-representation-based
methods for pathway analysis that
are currently available through InnateDB's data analysis
interface are widely established and considered a 'gold
standard'; yet, they neglect the fact that many components
are shared between seemingly unrelated pathways. To
address this issue, we have used the InnateDB collection
of pathway annotations as a basis to identify pairs of
genes that co-occur only in a single pathway and developed
a novel pathway analysis method [signature
over-representation analysis (SIGORA)] that focuses on
the over-representation of such gene pairs in a list of
genes of interest (Foroushani et al., submitted). SIGORA
is currently implemented as an R package [available from
the Comprehensive R Archive Network (CRAN) at
http://cran.r-project.org/web/packages/sigora/index.html] and
will be integrated into future releases of InnateDB.
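
The central idea of gene pairs that co-occur in only a single pathway can be illustrated with the short sketch below. It is not the SIGORA implementation: the function names, pathway names and gene identifiers are hypothetical, and the statistical evaluation of the pair counts is not reproduced here.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set


def pathway_unique_pairs(pathways: Dict[str, Set[str]]) -> Dict[FrozenSet[str], str]:
    """Return gene pairs that co-occur in exactly one pathway, mapped to that pathway."""
    pair_to_pathways = defaultdict(set)
    for name, genes in pathways.items():
        for pair in combinations(sorted(genes), 2):
            pair_to_pathways[frozenset(pair)].add(name)
    return {
        pair: next(iter(names))
        for pair, names in pair_to_pathways.items()
        if len(names) == 1
    }


def count_signature_hits(
    gene_list: Iterable[str], signatures: Dict[FrozenSet[str], str]
) -> Dict[str, int]:
    """Count, per pathway, the unique pairs fully contained in the gene list."""
    genes = set(gene_list)
    hits = defaultdict(int)
    for pair, pathway in signatures.items():
        if pair <= genes:
            hits[pathway] += 1
    return dict(hits)


# Toy pathways sharing the genes B and C.
pathways = {"P1": {"A", "B", "C"}, "P2": {"B", "C", "D"}}
signatures = pathway_unique_pairs(pathways)
hits = count_signature_hits(["A", "B", "C"], signatures)
```

In this toy example the pair (B, C) occurs in both pathways and is therefore not a signature of either, whereas the pairs (A, B) and (A, C) point unambiguously to P1.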

Finally, together with the PSICQUIC development 
team and other IMEx members, we are working on an 
improved reference implementation of PSICQUIC. We 
are also preparing to export our data in MITAB 2.7 
format. 



SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online: 
Supplementary Table 1. 